Learning Age and Gender Using Co-occurrence of Non-dictionary Words from Stylistic Variations

نویسنده

  • R. Rajendra Prasath
چکیده

This work attempts to report the stylistic differences in blogging for gender and age group variations using slang word co-occurrences. We have mainly focused on co-occurrence of non dictionary words across bloggers of different gender and age groups. For this analysis, we have focused on the feature use of slang words to study the stylistic variations of bloggers across various age groups and gender. We have modeled the co-occurrences of slang words used by bloggers as graph based model where nodes are slang words and edges represent the number of cooccurrences and studied the variations in predicting age groups and gender. We have used demographically tagged blog corpus from ICWSM Spinner dataset for these experiments and used Naive Bayes classifier with 10 fold cross validations. Preliminary results shows that the concurrence of of slang words could be a better choice for predicting age and gender.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Age and Gender of Blogger from Stylistic Variation

We report results of stylistic differences in blogging for gender andagegroupvariation.Theresultsarebasedontwomutually independent features. The first feature is the use of slang words which is a new concept proposed by us for Stylistic study of bloggers. For the second feature, we have analyzed the variation in average length of sentences across various age groups and gender. These features ar...

متن کامل

A Novel Face Detection Method Based on Over-complete Incoherent Dictionary Learning

In this paper, face detection problem is considered using the concepts of compressive sensing technique. This technique includes dictionary learning procedure and sparse coding method to represent the structural content of input images. In the proposed method, dictionaries are learned in such a way that the trained models have the least degree of coherence to each other. The novelty of the prop...

متن کامل

CO-graph: A new graph-based technique for cross-lingual word sense disambiguation

In this paper, we present a new method based on co-occurrence graphs for performing Cross-Lingual Word Sense Disambiguation (CLWSD). The proposed approach comprises the automatic generation of bilingual dictionaries, and a new technique for the construction of a co-occurrence graph used to select the most suitable translations from the dictionary. Different algorithms that combine both the dict...

متن کامل

Stylometric Analysis of Bloggers' Age and Gender

We report results of stylometric differences in blogging for gender and age group variation. The results are based on two mutually independent features. The first feature is the use of slang words which is a new concept proposed by us for Stylometric study of bloggers. Slang is a non-dictionary word that has evolved with time due to its frequent and popular usage. For the second feature, we hav...

متن کامل

Extracting Word Correspondences from Bilingual Corpora Based on Word Co-occurrence Information

A new method has been developed for extracting word correspondences from a bilingual corpus. First, the co-occurrence infi~rmation for each word in both languages is extracted li'om the corpus. Then, the correlations between the co-occurrence features of the words are calculated pairwisely with tile assistance of a basic word bilingual dictionary. Finally, the pairs of words with the highest co...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010